CS 6604: Data Mining 2 Relational Apriori 2.1 Queries
Abstract
1 Overview

In the previous several lectures, we mainly discussed algorithms for ILP, i.e., supervised learning of relational predicates. In this class, we focus on unsupervised algorithms for learning relational patterns. In particular, we look at algorithms that are the relational equivalents of traditional enumerative search algorithms. Take, for instance, the most famous algorithm in the association rules literature, Apriori. Its relational twin, call it Relational Apriori, is its generalization to first-order logic. Recall that one of the properties Apriori exploits is anti-monotonicity: if a set of items X does not have sufficient support, then no superset Y ⊇ X can have sufficient support either.

Suppose we have a relational database consisting of three tables, "Customer", "Parent", and "Buys", as follows:

Customer
  ID
  -----
  allen
  bill
  carol
  diana

Parent
  Parent ID   Child ID
  ---------   --------
  allen       bill
  allen       carol
  bill        zoe
  carol       diana

Buys
  Customer ID   Item
  -----------   -----
  allen         wine
  bill          candy
  bill          pizza
  diana         pizza

Table 1: Relational database RD with customer information

The patterns we try to mine can be viewed as queries. We say that a query-pattern matches the database if the set of tuples returned by the query is not empty. Take an ordinary SQL query as an example, in which we ask the relational database for the customer who buys its child "candy":

Let us convert this query into predicate logic form. The above SQL query is translated into the following logical clause form, Q1.
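The definition of matching can be made concrete with a small sketch. The Python below loads the three tables of Table 1 into an in-memory SQLite database and checks whether a query returns any tuples. The table and column names, the helper `matches`, and the query `CHILD_BUYS` are all illustrative assumptions; in particular, the query encodes just one possible reading of the request above (customers who have a child that buys a given item) and is not necessarily the SQL query or the clause Q1 used in the lecture.

```python
import sqlite3


def build_database():
    """Load the three tables of Table 1 into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id TEXT);
        CREATE TABLE parent (parent_id TEXT, child_id TEXT);
        CREATE TABLE buys (customer_id TEXT, item TEXT);
    """)
    conn.executemany("INSERT INTO customer VALUES (?)",
                     [("allen",), ("bill",), ("carol",), ("diana",)])
    conn.executemany("INSERT INTO parent VALUES (?, ?)",
                     [("allen", "bill"), ("allen", "carol"),
                      ("bill", "zoe"), ("carol", "diana")])
    conn.executemany("INSERT INTO buys VALUES (?, ?)",
                     [("allen", "wine"), ("bill", "candy"),
                      ("bill", "pizza"), ("diana", "pizza")])
    return conn


def matches(conn, sql, params=()):
    """A query-pattern matches the database if its result set is non-empty."""
    return conn.execute(sql, params).fetchone() is not None


# Assumed reading (not taken from the source): customers who have a child
# that buys the given item.
CHILD_BUYS = """
    SELECT DISTINCT c.id
    FROM customer AS c
    JOIN parent AS p ON p.parent_id = c.id
    JOIN buys AS b ON b.customer_id = p.child_id
    WHERE b.item = ?
"""

conn = build_database()
candy_parents = [row[0] for row in conn.execute(CHILD_BUYS, ("candy",))]
print(candy_parents)                        # ['allen']
print(matches(conn, CHILD_BUYS, ("candy",)))  # True
print(matches(conn, CHILD_BUYS, ("wine",)))   # False
```

Under this reading, the query matches for "candy" because bill buys candy and allen is bill's parent. It does not match for "wine", since the only wine buyer, allen, has no parent listed in the Parent table.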
Similar resources
Association Rules in the Relational Calculus
One of the most utilized data mining tasks is the search for association rules. Association rules represent significant relationships between items in transactions. We extend the concept of association rule to represent a much broader class of associations, which we refer to as entity-relationship rules. Semantically, entity-relationship rules express associations between properties of related ...
DWMiner: A Tool for Mining Frequent Item Sets Efficiently in Data Warehouses
This work presents DWMiner, an efficient association rule mining tool that processes data directly over a relational DBMS data warehouse. DWMiner executes the Apriori algorithm as SQL queries in parallel, using a database PC Cluster middleware developed for SQL query optimization in OLAP applications. DWMiner combines intra- and inter-query parallelism in order to reduce the total time needed to fin...
Multi-Relational Data Mining
An important aspect of data mining algorithms and systems is that they should scale well to large databases. A consequence of this is that most data mining tools are based on machine learning algorithms that work on data in attribute-value format. Experience has proven that such ’single-table’ mining algorithms indeed scale well. The downside of this format is, however, that more complex patter...
Efficient Mining for Association Rules with Relational Database Systems
With the tremendous growth of large-scale data repositories, a need for integrating the exploratory techniques of data mining with the capabilities of relational systems to efficiently handle large volumes of data has now risen. In this paper, we look at the performance of the most prevalent association rule mining algorithm Apriori, with IBM’s DB2 Universal Database system. We show that a mult...
Evaluation of Common Counting Method for Concurrent Data Mining Queries
Data mining queries are often submitted concurrently to the data mining system. The data mining system should take advantage of overlapping of the mined datasets. In this paper we focus on frequent itemset mining and we discuss and experimentally evaluate the implementation of the Common Counting method on top of the Apriori algorithm. The general idea of Common Counting is to reduce the number...